Disclaimer: To ensure that the notebook can be run from (more or less) any point, I load the relevant functions or modules whenever I use them in a cell. This is generally not good practice, as it adds unnecessary overhead.

0. Image representation as numerical arrays

We start by importing numpy, then creating and printing a simple 9x9 checkerboard array:

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload
import numpy as np

checkBoard = np.zeros((9,9))
checkBoard[0::2, 1::2] = 1
checkBoard[1::2, 0::2] = 1

print(checkBoard)
[[0. 1. 0. 1. 0. 1. 0. 1. 0.]
 [1. 0. 1. 0. 1. 0. 1. 0. 1.]
 [0. 1. 0. 1. 0. 1. 0. 1. 0.]
 [1. 0. 1. 0. 1. 0. 1. 0. 1.]
 [0. 1. 0. 1. 0. 1. 0. 1. 0.]
 [1. 0. 1. 0. 1. 0. 1. 0. 1.]
 [0. 1. 0. 1. 0. 1. 0. 1. 0.]
 [1. 0. 1. 0. 1. 0. 1. 0. 1.]
 [0. 1. 0. 1. 0. 1. 0. 1. 0.]]

Then, we import the pyplot and image modules from the plotting library matplotlib. Using them, we can display our checkerboard array in image form:

In [2]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
plt.imshow(checkBoard, cmap='gray', interpolation='nearest') 
plt.show()

As another simple example, we will import the data module from the image processing library scikit-image and load a small image of a bush.

First, we want to print the pixel values:

In [3]:
from skimage import data
image_of_a_bush = data.lfw_subset()
image_of_a_bush = image_of_a_bush[0,:,:]

#print the number of dimensions, the shape, and the pixel values of the image
print("The number of dimensions of the image is: ", image_of_a_bush.ndim)
print("The size of the image is: ", image_of_a_bush.shape)
print(image_of_a_bush)
The number of dimensions of the image is:  2
The size of the image is:  (25, 25)
[[0.28888887 0.32941177 0.38039216 0.50326794 0.47973856 0.50457519
  0.55947715 0.54901963 0.56732029 0.57516342 0.59738559 0.61307186
  0.59607846 0.56078434 0.54248363 0.49281046 0.45359477 0.44183007
  0.2522876  0.23529412 0.4261438  0.49673203 0.72418302 0.69934636
  0.43529412]
 [0.29411766 0.33333334 0.44052288 0.52026147 0.49934641 0.53464049
  0.55424833 0.59869283 0.6156863  0.61960787 0.620915   0.63660127
  0.62875813 0.6326797  0.59215689 0.52418303 0.47581699 0.44705883
  0.34640524 0.32287583 0.33202612 0.59738559 0.81830066 0.77777773
  0.27843139]
 [0.35294119 0.41568628 0.42745098 0.48104575 0.50457519 0.52287579
  0.53594774 0.63529414 0.65359479 0.620915   0.61830068 0.64444441
  0.61830068 0.61830068 0.60000002 0.53856206 0.46405229 0.43137255
  0.37254903 0.37908494 0.29411766 0.50065356 0.60915029 0.53856206
  0.36732024]
 [0.41176471 0.43006536 0.49542484 0.48627451 0.50980395 0.52679735
  0.53071892 0.60915029 0.57516342 0.59869283 0.63006538 0.64705884
  0.63921571 0.63398695 0.59738559 0.54248363 0.46013072 0.41699347
  0.39607844 0.30718955 0.26274511 0.48627451 0.26143789 0.1856209
  0.19477125]
 [0.43137255 0.49673203 0.47712418 0.46797386 0.48627451 0.53071892
  0.5411765  0.60130715 0.63660127 0.61830068 0.59869283 0.61437911
  0.63006538 0.61699343 0.5581699  0.5281046  0.47973856 0.45228758
  0.4130719  0.3594771  0.23921569 0.36993465 0.1856209  0.16732027
  0.19084968]
 [0.48104575 0.47320262 0.39869279 0.43921569 0.47973856 0.54248363
  0.55032676 0.58954245 0.62614381 0.61960787 0.61960787 0.62222224
  0.63137257 0.620915   0.57908499 0.54771245 0.49542484 0.43137255
  0.44052288 0.27712417 0.22222222 0.41437906 0.1751634  0.1751634
  0.19084968]
 [0.42483661 0.42222223 0.35816994 0.46797386 0.50326794 0.51503265
  0.43529412 0.49019608 0.53725493 0.49803922 0.47189543 0.55947715
  0.56209147 0.55424833 0.42091504 0.36862746 0.28627452 0.29934642
  0.34640524 0.28627452 0.32549021 0.31895426 0.18431373 0.18431373
  0.20261438]
 [0.36862746 0.31764707 0.4627451  0.49803922 0.48235294 0.4379085
  0.43921569 0.38039216 0.3019608  0.31111112 0.40915033 0.44836602
  0.49542484 0.47189543 0.33725491 0.28104573 0.41437906 0.41437906
  0.35032681 0.43267974 0.37385622 0.31764707 0.17908497 0.19869281
  0.19869281]
 [0.20784314 0.29411766 0.50718951 0.48104575 0.47450981 0.45228758
  0.47189543 0.42745098 0.28235295 0.27189544 0.28888887 0.45751634
  0.61045754 0.34509805 0.25098041 0.17908497 0.16078432 0.2
  0.43006536 0.47712418 0.29934642 0.59346402 0.41568628 0.31503269
  0.1738562 ]
 [0.26274511 0.27973858 0.46928105 0.50718951 0.56601304 0.49803922
  0.4509804  0.55163401 0.31111112 0.52287579 0.52549022 0.52026147
  0.59346402 0.42091504 0.43921569 0.50065356 0.46535948 0.5281046
  0.47320262 0.45228758 0.21699347 0.81045753 0.74640518 0.35424837
  0.15294118]
 [0.30326799 0.4130719  0.45882353 0.50588238 0.58562088 0.57124186
  0.57516342 0.59607846 0.53986931 0.5411765  0.58300656 0.48366013
  0.55555558 0.44444445 0.47320262 0.52026147 0.52026147 0.49019608
  0.53856206 0.47058824 0.41045749 0.80130714 0.71241832 0.33464053
  0.1633987 ]
 [0.42091504 0.40522876 0.43529412 0.49150327 0.55555558 0.60130715
  0.61045754 0.63006538 0.61176473 0.65620911 0.5281046  0.53333336
  0.56601304 0.48496732 0.46143791 0.5281046  0.56601304 0.55555558
  0.48235294 0.43921569 0.47581699 0.50326794 0.35555553 0.28888887
  0.20130718]
 [0.53725493 0.43137255 0.4627451  0.47843137 0.50065356 0.57124186
  0.64444441 0.627451   0.64836597 0.64052284 0.50457519 0.5411765
  0.63660127 0.55555558 0.44967321 0.48496732 0.61437911 0.55686277
  0.44183007 0.42091504 0.42483661 0.19738561 0.21568628 0.22875817
  0.22875817]
 [0.42483661 0.44313726 0.44444445 0.45620915 0.49019608 0.50588238
  0.57908499 0.60130715 0.6026144  0.53071892 0.54509807 0.55163401
  0.56078434 0.61176473 0.43398693 0.41437906 0.48235294 0.49803922
  0.42091504 0.41045749 0.38954249 0.19084968 0.21960784 0.22614379
  0.23529412]
 [0.46797386 0.43529412 0.45228758 0.46797386 0.50457519 0.4875817
  0.52026147 0.53594774 0.50196081 0.47320262 0.39477122 0.29803923
  0.49411765 0.49281046 0.26797387 0.40784314 0.38039216 0.44705883
  0.38954249 0.39869279 0.20261438 0.18300654 0.20915033 0.22875817
  0.23529412]
 [0.4261438  0.39607844 0.44575164 0.46143791 0.51241833 0.47450981
  0.49019608 0.49673203 0.4509804  0.5581699  0.58039218 0.54901963
  0.40653592 0.40261436 0.41960785 0.4261438  0.35686275 0.3764706
  0.37516338 0.4130719  0.1751634  0.17777777 0.2130719  0.21699347
  0.23137255]
 [0.37124181 0.37516338 0.41437906 0.44705883 0.49673203 0.49019608
  0.50326794 0.51503265 0.50718951 0.50980395 0.58562088 0.60392159
  0.55032676 0.55163401 0.47320262 0.41045749 0.37254903 0.4013072
  0.4130719  0.36732024 0.4130719  0.30718955 0.18692811 0.30849671
  0.22614379]
 [0.13986929 0.08888888 0.38562092 0.44183007 0.46535948 0.52287579
  0.53594774 0.50588238 0.48104575 0.46928105 0.49019608 0.54771245
  0.53725493 0.54901963 0.46666667 0.39084965 0.3764706  0.4261438
  0.37908494 0.20392157 0.08627451 0.09019608 0.1124183  0.0627451
  0.22745098]
 [0.10980392 0.09934641 0.32810456 0.41960785 0.4627451  0.50065356
  0.5464052  0.50326794 0.48104575 0.36601308 0.16470589 0.09019608
  0.11503268 0.10980392 0.06535947 0.2496732  0.41960785 0.43398693
  0.37908494 0.08627451 0.09281045 0.06797386 0.075817   0.03660131
  0.06797386]
 [0.09673202 0.44313726 0.49934641 0.40000001 0.46535948 0.47581699
  0.51111108 0.53333336 0.4875817  0.15163399 0.13202615 0.16732027
  0.21437909 0.21176471 0.16601306 0.32287583 0.39477122 0.43529412
  0.37777779 0.09019608 0.07843138 0.0875817  0.06797386 0.04836601
  0.075817  ]
 [0.10588235 0.44967321 0.67843139 0.35686275 0.41437906 0.42745098
  0.49542484 0.49542484 0.52026147 0.56862748 0.56862748 0.44967321
  0.44183007 0.52418303 0.43137255 0.40784314 0.49934641 0.40522876
  0.15163399 0.10196079 0.07189543 0.06797386 0.07189543 0.06013072
  0.07189543]
 [0.10326798 0.26013073 0.84444445 0.37385622 0.37124181 0.41176471
  0.46797386 0.5411765  0.54509807 0.51895422 0.5281046  0.55163401
  0.53986931 0.55163401 0.41176471 0.41437906 0.47712418 0.35816994
  0.07712418 0.0875817  0.07843138 0.06013072 0.07189543 0.07189543
  0.06797386]
 [0.10980392 0.10588235 0.80653596 0.77516341 0.36732024 0.39346406
  0.42222223 0.53202617 0.57908499 0.52156866 0.47973856 0.44444445
  0.39477122 0.38954249 0.35816994 0.42745098 0.43137255 0.14509805
  0.08366013 0.08627451 0.075817   0.06013072 0.05620915 0.07189543
  0.075817  ]
 [0.10588235 0.11372549 0.81307185 0.80000001 0.7647059  0.38039216
  0.39215687 0.46405229 0.58300656 0.56732029 0.53986931 0.50849676
  0.49673203 0.46143791 0.42745098 0.48235294 0.30980393 0.0875817
  0.07320261 0.08366013 0.0875817  0.06405229 0.04444444 0.06013072
  0.07189543]
 [0.09150327 0.0875817  0.66405225 0.84052289 0.8143791  0.82222223
  0.37777779 0.36862746 0.52679735 0.60130715 0.59477127 0.57385617
  0.55686277 0.54248363 0.51764709 0.45620915 0.16732027 0.08235294
  0.07058824 0.06797386 0.07450981 0.06013072 0.05620915 0.06405229
  0.06797386]]

Can you see the bush?

Next, show the image:

In [4]:
plt.figure(figsize=(1,1))
plt.imshow(image_of_a_bush, cmap='gray', interpolation='nearest') 
plt.show()

1. Pixel-level operations

Now that we have a sense of what a digital image is, let's start manipulating it. We'll begin with simple pixel-level operations.

1.1 Basic pixel-level operations

Let's look at a more interesting image. From scikit-image's data module we'll open an example IHC (immunohistochemistry) image and plot it using pyplot.

In [4]:
from skimage import data
import matplotlib.pyplot as plt
import numpy as np

image_hist = data.immunohistochemistry()
#check the size of the image
print("The number of dimensions of the image is: ", image_hist.ndim)
print("The size of the image is: ", image_hist.shape)
plt.imshow(image_hist) # note: a cmap is ignored for RGB images
The number of dimensions of the image is:  3
The size of the image is:  (512, 512, 3)
Out[4]:
<matplotlib.image.AxesImage at 0x1c24e0ab10>

Seems like we have an RGB image. Let's look at every channel independently.

In [5]:
plt.figure(figsize=(15,5))
plt.subplot(131)
plt.gca().set_title('Red channel')
plt.imshow(image_hist[:,:,0], cmap='Reds', interpolation='nearest')
plt.subplot(132)
plt.gca().set_title('Green channel')
plt.imshow(image_hist[:,:,1], cmap='Greens', interpolation='nearest')
plt.subplot(133)
plt.gca().set_title('Blue channel')
plt.imshow(image_hist[:,:,2], cmap='Blues', interpolation='nearest')

plt.show()
In [6]:
#for the moment let's look at only the first color channel
image_hist = image_hist[:,:,0]
plt.gca().set_title('First channel')
plt.imshow(image_hist, cmap=plt.cm.gray)
Out[6]:
<matplotlib.image.AxesImage at 0x1c257240d0>

We can invert the image using the invert function from scikit-image's util module:

In [7]:
from skimage.util import invert

inverted_image = invert(image_hist)

plt.figure(figsize=(15,5))
plt.subplot(121)
plt.gca().set_title('original image')
plt.imshow(image_hist, cmap=plt.cm.gray)
plt.subplot(122)
plt.gca().set_title('inverted image')
plt.imshow(inverted_image, cmap=plt.cm.gray)
Out[7]:
<matplotlib.image.AxesImage at 0x1c24f7e210>

Let's try some other pixel-level operations. We'll use the exposure module from scikit-image.

  1. A gamma correction applies the nonlinear transform $V_{out} = V_{in}^\gamma$.

  2. A log transform applies $V_{out} = \log(V_{in}+1)$.

  3. A sigmoid transform applies $V_{out} = \frac{1}{1+e^{gain\cdot(\text{cutoff}-V_{in})}}$.

  4. Equalization transforms the intensity histogram of an image to a uniform distribution. It often enhances the contrast of the image.

  5. CLAHE (Contrast Limited Adaptive Histogram Equalization) works like equalization, but is applied separately to different regions of the image.

Try to apply these by calling the relevant function from skimage.exposure, or by direct calculation.

Play with the different parameters and see how they change the output.
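Before reaching for the library, it can help to see how equalization works by direct calculation. Below is a minimal numpy-only sketch (the function name `equalize_np` and the bin count are my own choices, not scikit-image's API): each pixel is mapped to the value of the normalized cumulative histogram at its intensity.

```python
import numpy as np

def equalize_np(img, nbins=256):
    """Histogram equalization by direct calculation: pixel -> CDF(pixel)."""
    counts, edges = np.histogram(img.ravel(), bins=nbins)
    cdf = np.cumsum(counts).astype(float)
    cdf /= cdf[-1]                           # normalize so the output spans [0, 1]
    centers = (edges[:-1] + edges[1:]) / 2
    # interpolate each pixel's intensity onto the cumulative histogram
    return np.interp(img.ravel(), centers, cdf).reshape(img.shape)
```

Because the CDF is monotone, pixel ordering is preserved; only the spacing of intensities changes.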

In [8]:
from skimage import exposure

# apply gamma scaling with gamma=2
gamma=2
gamma_corrected = exposure.adjust_gamma(image_hist, gamma)

# apply logarithmic scaling
logarithmic_corrected = exposure.adjust_log(image_hist)

# apply sigmoidal scaling with cutoff=0.4
cutoff = 0.4
sigmoid_corrected = exposure.adjust_sigmoid(image_hist, cutoff=cutoff)

# equalize
equalize_corrected = exposure.equalize_hist(image_hist)

# apply Contrast Limited Adaptive Histogram Equalization (CLAHE)
CLHA_corrected = exposure.equalize_adapthist(image_hist)

plt.figure(figsize=(15,10))

plt.subplot(231)
plt.gca().set_title('original')
plt.imshow(image_hist, cmap=plt.cm.gray)

plt.subplot(232)
plt.gca().set_title('gamma corrected')
plt.imshow(gamma_corrected, cmap=plt.cm.gray)

plt.subplot(233)
plt.gca().set_title('log corrected')
plt.imshow(logarithmic_corrected, cmap=plt.cm.gray)

plt.subplot(234)
plt.gca().set_title('sigmoid')
plt.imshow(sigmoid_corrected, cmap=plt.cm.gray)

plt.subplot(235)
plt.gca().set_title('equalized')
plt.imshow(equalize_corrected, cmap=plt.cm.gray)

plt.subplot(236)
plt.gca().set_title('CLAHE corrected')
plt.imshow(CLHA_corrected, cmap=plt.cm.gray)
Out[8]:
<matplotlib.image.AxesImage at 0x1c25623a50>

1.2 Image filtering

Spatial filtering is an image processing technique that changes the intensity of each pixel according to the intensities of some neighborhood of pixels.

The kernel of the filter defines the neighborhood and the weights assigned to each pixel in the neighborhood:

This procedure is formally a convolution and is marked by an asterisk: $I_o = I_i\ast f$.

Side note: since a convolution in the spatial domain is equivalent to multiplication in the frequency domain, it is sometimes computationally cheaper to compute these in Fourier space.

Side side note: filtering can also be performed in the frequency domain directly, by removing a set of frequencies from an image.
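To see the first side note in action, here is a small numpy-only sketch (the array sizes and the 3x3 mean kernel are arbitrary choices for illustration) verifying that a circular convolution computed directly in the spatial domain matches a pointwise product in Fourier space:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((16, 16))

ker = np.zeros((16, 16))
ker[:3, :3] = 1.0 / 9.0               # a 3x3 mean kernel, zero-padded to image size

# circular (periodic) convolution computed directly in the spatial domain
direct = np.zeros_like(img)
for dy in range(3):
    for dx in range(3):
        direct += ker[dy, dx] * np.roll(np.roll(img, dy, axis=0), dx, axis=1)

# the same convolution as a pointwise product in the frequency domain
fourier = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(ker)).real
```

The two results agree to floating-point precision; for large kernels the FFT route is much faster.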

The kernel can be of any shape/size; it is applied to each pixel in the image, and the output is a new, filtered image. The output image is often called the response to the given filter.

As an example, consider a local average filter:
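A local average can be written by hand in a few lines of numpy. This sketch (the function name `local_mean3` is my own, and it assumes periodic boundaries for simplicity) replaces each pixel with the mean of its 3x3 neighborhood:

```python
import numpy as np

def local_mean3(img):
    """3x3 local average with periodic (wrap-around) boundaries, via shifted copies."""
    out = np.zeros_like(img, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out / 9.0
```

On a constant image the output is unchanged; on a noisy image the noise variance drops by roughly a factor of 9.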

Filtering is an incredibly versatile tool with which you can emphasize certain features or remove other features.

Image processing operations implemented with filtering include smoothing, sharpening, and edge enhancement.

To implement different image filters, we will use the filters module from scikit-image.

1.2.1 Smoothing

Smoothing, aka low-pass filtering, is used for removing high-frequency noise from images. Most commonly, a gaussian kernel is used, but others (e.g. local mean/median) work too. We'll see the effect of gaussian filtering.

Try to change the value of sigma (width of the gaussian) to see how the output changes.

In [9]:
import matplotlib.pyplot as plt
import numpy as np

from skimage import filters

image_hist = data.immunohistochemistry()

sigma = 2

# note: multichannel=True smooths each color channel separately
# (newer scikit-image versions use channel_axis=-1 instead)
gauss_filtered_img = filters.gaussian(image_hist, sigma=sigma, multichannel=True)

plt.figure(figsize=(15,8))
plt.subplot(121)
plt.gca().set_title('original image')
plt.imshow(image_hist, cmap=plt.cm.gray)
plt.subplot(122)
plt.gca().set_title('response, gaussian smoothing')
plt.imshow(gauss_filtered_img, cmap=plt.cm.gray)
Out[9]:
<matplotlib.image.AxesImage at 0x1c25b12a10>

1.2.2 Sharpening

Sharpening (also known as unsharp masking) is sometimes used to enhance a blurry (i.e. crappy) image:

  1. Start with the input image
  2. Apply a gaussian filter with a very narrow kernel
  3. Subtract the filtered image from the input image to keep only the high-frequency components
  4. Amplify the high-frequency components (by a factor alpha) and add them back to the input image
In [10]:
filter_blurred_f = filters.gaussian(gauss_filtered_img, sigma=0.5, multichannel=True)
alpha = 3
sharpened = gauss_filtered_img + alpha * (gauss_filtered_img - filter_blurred_f)


plt.figure(figsize=(15,8))
plt.subplot(121)
plt.gca().set_title('input - blurry image')
plt.imshow(gauss_filtered_img, cmap=plt.cm.gray)
plt.subplot(122)
plt.gca().set_title('sharpened')
plt.imshow(sharpened, cmap=plt.cm.gray)
Out[10]:
<matplotlib.image.AxesImage at 0x1c25defc50>

1.2.3 Edge enhancement

Edge detecting filters work by measuring the local spatial gradient of an image. Common types are the Sobel, Prewitt and Roberts filters.

The filters are usually applied to each direction individually and then the total magnitude of the gradient is calculated.

$|\nabla| = \sqrt{\nabla_x^2+\nabla_y^2}$

Sobel:

$G_x = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} \quad G_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$

Prewitt:

$G_x = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix} \quad G_y = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix}$

Roberts:

$G_x = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \quad G_y = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$
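To make the direction-then-magnitude recipe concrete, here is a numpy-only sketch (the helper names `conv2_valid`, `Kh`, and `Kv` are my own) that applies the two standard Sobel kernels by direct correlation and combines the responses with the gradient-magnitude formula above:

```python
import numpy as np

def conv2_valid(img, k):
    """Direct 'valid' 2-D correlation with a 3x3 kernel (no padding)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out

# the standard Sobel kernels
Kh = np.array([[ 1.,  2.,  1.],
               [ 0.,  0.,  0.],
               [-1., -2., -1.]])
Kv = Kh.T

img = np.tile(np.arange(6, dtype=float), (6, 1))  # intensity ramp along x
gh = conv2_valid(img, Kh)        # horizontal-edge response
gv = conv2_valid(img, Kv)        # vertical-edge response
mag = np.hypot(gh, gv)           # total gradient magnitude
```

On the x-ramp, the horizontal-edge response is zero everywhere while the magnitude is a nonzero constant, as expected for a uniform gradient.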

In [59]:
from skimage import data
image_hist = data.immunohistochemistry()
image_hist = image_hist[:,:,2]

# sobel magnitude
filtered_img = filters.sobel(image_hist)
# sobel horizontal
filtered_img_h = filters.sobel_h(image_hist)
# sobel vertical
filtered_img_v = filters.sobel_v(image_hist)

plt.figure(figsize=(15,16))
plt.subplot(221)
plt.gca().set_title('input image')
plt.imshow(image_hist, cmap=plt.cm.gray)
plt.subplot(222)
plt.gca().set_title('sobel filter response - magnitude')
plt.imshow(filtered_img, cmap=plt.cm.gray)
plt.subplot(223)
plt.gca().set_title('sobel filter response - horizontal edges')
plt.imshow(np.abs(filtered_img_h), cmap=plt.cm.gray)
plt.subplot(224)
plt.gca().set_title('sobel filter response - vertical edges')
plt.imshow(np.abs(filtered_img_v), cmap=plt.cm.gray)
Out[59]:
<matplotlib.image.AxesImage at 0x1c381b1e10>

Direct application of edge detectors often results in somewhat noisy responses. A way to overcome this is by first smoothing the image with a gaussian filter and then applying the edge filter. The width of the gaussian kernel will determine the size of edges the filter detects.

In [60]:
from skimage import data
from skimage import feature

image_hist = data.immunohistochemistry()
image_hist = image_hist[:,:,2]

# sobel magnitude
filtered_img = filters.sobel(image_hist)

DoG_img = filters.sobel(filters.gaussian(image_hist,sigma=2))

plt.figure(figsize=(15,16))
plt.subplot(221)
plt.gca().set_title('input image')
plt.imshow(image_hist, cmap=plt.cm.gray)

plt.subplot(223)
plt.gca().set_title('sobel filter response - magnitude')
plt.imshow(filtered_img, cmap=plt.cm.gray)
plt.subplot(224)
plt.gca().set_title('smoothed sobel (DoG) response - magnitude')
plt.imshow(DoG_img, cmap=plt.cm.gray)
Out[60]:
<matplotlib.image.AxesImage at 0x1c285cea10>

This smoothed gradient, a derivative of Gaussian (DoG), is also the first step of the Canny edge detection algorithm:

  1. Apply a DoG filter
  2. Apply non-maximum suppression
  3. Apply hysteresis thresholding
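Hysteresis thresholding, the last Canny step, can be sketched in plain numpy: pixels above a low threshold are kept only if they connect to a pixel above a high threshold. This is a simplified illustration (the function name is my own, and np.roll wraps around the image edges, which a real implementation would mask out):

```python
import numpy as np

def hysteresis_threshold(img, low, high):
    """Keep pixels above `low` only if they connect to a pixel above `high`.

    Simplified sketch: np.roll wraps around the image edges.
    """
    weak = img > low
    keep = img > high                 # seeds: the "strong" pixels
    while True:
        grown = keep.copy()
        for dy in (-1, 0, 1):         # grow by one pixel in all 8 directions
            for dx in (-1, 0, 1):
                grown |= np.roll(np.roll(keep, dy, axis=0), dx, axis=1)
        grown &= weak                 # but never beyond the weak mask
        if np.array_equal(grown, keep):
            return keep
        keep = grown
```

Because the kept set only grows and is bounded by the weak mask, the loop always converges.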

1.2.4 Gaussian derivatives

Now let's remember that the application of filters is an associative operation (because convolution is associative!). Applying the gaussian and then the gradient filter to the image is equivalent to applying the gradient filter to the gaussian first and then applying the result to the image, i.e. $\nabla*(G*I) = (\nabla*G)*I$, where $\nabla$ is the gradient (derivative) filter, $G$ is the gaussian filter, and $I$ is the image.

We can generalize this idea and apply the derivative multiple times, to get a family of interesting filters:

$\nabla^n*G$:



Let's use a second-order derivative (aka a Laplacian) to make a ridge detector:

We'll look at a retinal photo, where the vasculature is an interesting feature.

We'll invert the image so that the vasculature appears as bright lines on a dark background.

In [106]:
import matplotlib.pyplot as plt
import numpy as np
from skimage import img_as_float
import matplotlib.image as mpimg
from skimage.color import rgb2gray
from skimage.util import invert

#this is how we load an image from the hard drive
image_retina = img_as_float(mpimg.imread("../Data/RetinalPhoto.png"))
image_ridges = invert(rgb2gray(image_retina))

plt.figure(figsize=(15,5))
plt.subplot(121)
plt.gca().set_title('retinal photo')
plt.imshow(image_retina, cmap=plt.cm.gray)
plt.subplot(122)
plt.gca().set_title('inverted grayscale image')
plt.imshow(image_ridges, cmap=plt.cm.gray)
Out[106]:
<matplotlib.image.AxesImage at 0x1c47ff3610>

We find ridges in the inverted image by applying a 2nd-order gaussian derivative filter. The second-order derivative operator is called a Laplacian:
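As a quick sanity check of what the Laplacian computes, the 5-point discrete Laplacian of the paraboloid $x^2+y^2$ should be 4 everywhere, matching the analytic value. A numpy-only sketch (the function name is my own; only interior pixels are evaluated, to sidestep boundary handling):

```python
import numpy as np

def laplacian_interior(img):
    """5-point discrete Laplacian, evaluated on interior pixels only."""
    return (img[:-2, 1:-1] + img[2:, 1:-1] +
            img[1:-1, :-2] + img[1:-1, 2:] - 4.0 * img[1:-1, 1:-1])

yy, xx = np.indices((6, 6))
f = (xx ** 2 + yy ** 2).astype(float)  # paraboloid: analytic Laplacian is 4
```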

In [109]:
sigma=3
LoG_ridges = filters.laplace(filters.gaussian(image_ridges, sigma=sigma))

plt.imshow(LoG_ridges,cmap='gray')
plt.gca().set_title('retinal photo - response to LoG')
Out[109]:
Text(0.5, 1.0, 'retinal photo - response to LoG')

This commonly used ridge detector is called a Laplacian of Gaussian (LoG)!

(We'll get back to this image in a bit)

1.3 Masking

A mask is a binary image (0s and 1s) that typically separates an input image into foreground (interesting) and background (boring) regions, or picks out a region-of-interest (ROI). A mask is applied to an image by element-wise multiplication, so its size must be identical to the size of the image it's applied to.

Let's begin by creating a simple circular mask. We'll create an array where the value at each point is its distance from the center of the image, and display it as an image:

In [78]:
import matplotlib.pyplot as plt
import numpy as np

#dimensions in x and y
y = 512
x = 512

#position of center
centY = np.ceil(y/2)
centX = np.ceil(x/2)

#create the grid
yy,xx = np.indices((y,x))

#create radial distance map
radialDist = np.sqrt((xx-centX)**2 + (yy - centY)**2)

#display
plt.gca().set_title('Radial distance')
plt.imshow(radialDist, cmap='gray', interpolation='nearest')
plt.show()

Of these points, we'll pick a circle of radius 100 and display it as an image:

In [79]:
circ1 = radialDist < 100

plt.gca().set_title('Circle with radius 100')
plt.imshow(circ1, cmap='inferno', interpolation='nearest')
Out[79]:
<matplotlib.image.AxesImage at 0x1c41ab35d0>

This object is a mask. If you multiply this matrix of 0s and 1s with an image of the same size, only the parts of the image where the mask ==1 will remain.

Let's apply this mask to our histology image.

In [80]:
from skimage import data
image_hist = data.immunohistochemistry()
image_hist = image_hist[:,:,2]

plt.gca().set_title('Masked first channel')
plt.imshow(image_hist*circ1, cmap=plt.cm.gray)
Out[80]:
<matplotlib.image.AxesImage at 0x1c42989090>
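Element-wise multiplication is not the only way to use a mask: boolean indexing pulls out the masked pixels directly, which is handy for measuring statistics inside a region. A small self-contained sketch on a synthetic image (all sizes and values here are made up for illustration):

```python
import numpy as np

# a synthetic 8x8 "image" with a bright disk, mirroring the circular mask above
yy, xx = np.indices((8, 8))
mask = (xx - 4) ** 2 + (yy - 4) ** 2 < 9   # boolean disk mask
img = np.full((8, 8), 5.0)
img[mask] = 10.0                            # bright foreground

# boolean indexing returns a 1-D array of just the masked pixels
fg_mean = img[mask].mean()
bg_mean = img[~mask].mean()
```

Unlike multiplication, this discards the background pixels entirely, so they can't bias statistics like the mean.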

What happens if we invert the mask?

In [81]:
from skimage.util import invert

inverted_mask = invert(circ1)

plt.figure(figsize=(15,5))
plt.subplot(121)
plt.gca().set_title('inverted mask')
plt.imshow(inverted_mask, cmap=plt.cm.gray)
plt.subplot(122)
plt.gca().set_title('inverted masked image')
plt.imshow(image_hist*inverted_mask, cmap=plt.cm.gray)
Out[81]:
<matplotlib.image.AxesImage at 0x1c41a60b50>

Just for closure, let's see what happens when we try to apply the mask to the full RGB image:

In [82]:
image = data.immunohistochemistry()
plt.imshow(image*circ1, cmap=plt.cm.gray)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-82-b5f0a8601462> in <module>
      1 image = data.immunohistochemistry()
----> 2 plt.imshow(image*circ1, cmap=plt.cm.gray)

ValueError: operands could not be broadcast together with shapes (512,512,3) (512,512) 

Whoops, something went wrong. The problem is that numpy didn't know how to multiply a 512x512x3 image by a 512x512 mask. Numpy makes this easy to solve: we add a singleton dimension to the mask so the two arrays can be broadcast together (look up broadcasting in your spare time).
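A minimal sketch of the broadcasting rule at play (tiny made-up arrays, not the histology image): a (4,4,3) array times a (4,4) mask fails because the trailing dimensions 3 and 4 don't match, while a (4,4,1) mask broadcasts across the channel axis:

```python
import numpy as np

rgb = np.zeros((4, 4, 3))            # stand-in for an RGB image
mask = np.ones((4, 4), dtype=bool)   # stand-in for a 2-D mask

# rgb * mask raises a ValueError: trailing dims (3) and (4) don't match.
# Adding a singleton channel axis makes the shapes broadcast-compatible:
masked = rgb * mask[:, :, None]      # same as rgb * np.expand_dims(mask, 2)
```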

In [83]:
image = data.immunohistochemistry()
plt.gca().set_title('Masked image')
plt.imshow(image*np.expand_dims(circ1,2), cmap=plt.cm.gray)
Out[83]:
<matplotlib.image.AxesImage at 0x1c432ba810>

1.4 Thresholding

1.4.1 Simple thresholding

Thresholding an image is the process of choosing an intensity (or intensities) that separates the different components of an image.

In the simplest case, the foreground and background have very different intensities, and thresholding is just clustering pixels by their intensity levels.

In [84]:
import matplotlib.pyplot as plt
import numpy as np
#img_as_float converts images of integer types into floats, which are easier to work with
from skimage import img_as_float
from skimage import data

# First, let's create a noisy image of blobs
image_blobs = img_as_float(data.binary_blobs(length=512, seed=1))
sigma = 0.22
image_blobs += np.random.normal(loc=0, scale=sigma, size=image_blobs.shape)

print("The number of dimensions of the image is: ", image_blobs.ndim)
print("The size of the image is: ", image_blobs.shape)
plt.imshow(image_blobs, cmap=plt.cm.gray)
The number of dimensions of the image is:  2
The size of the image is:  (512, 512)
Out[84]:
<matplotlib.image.AxesImage at 0x1c433974d0>

To find the right threshold, let's examine a histogram of pixel intensity values

In [85]:
plt.hist(image_blobs.flatten(),bins=250)
plt.show()

Pick an appropriate threshold, by eye, and see if you can remove the background. What happens when you increase or decrease the threshold?

In [86]:
thresh = 0.5

mask = image_blobs>thresh
plt.figure(figsize=(15,5))
plt.subplot(131)
plt.gca().set_title('original')
plt.imshow(image_blobs, interpolation='nearest', cmap=plt.cm.gray)
plt.subplot(132)
plt.gca().set_title('mask')
plt.imshow(mask, interpolation='nearest', cmap=plt.cm.gray)
plt.subplot(133)
plt.gca().set_title('masked image')
plt.imshow(image_blobs*mask, interpolation='nearest', cmap=plt.cm.gray)
Out[86]:
<matplotlib.image.AxesImage at 0x1c43df3410>

Our mask looks ok, but it has a lot of salt & pepper speckle noise. Why is that?

We can try and use what we learned before about filtering to clean up our results. What filter should we use?

In [87]:
from skimage import filters

thresh = 0.5

mask = filters.gaussian(image_blobs, sigma=1)>thresh

plt.figure(figsize=(15,5))
plt.subplot(131)
plt.gca().set_title('original')
plt.imshow(image_blobs, interpolation='nearest', cmap=plt.cm.gray)
plt.subplot(132)
plt.gca().set_title('mask')
plt.imshow(mask, interpolation='nearest', cmap=plt.cm.gray)
plt.subplot(133)
plt.gca().set_title('masked image')
plt.imshow(image_blobs*mask, interpolation='nearest', cmap=plt.cm.gray)
Out[87]:
<matplotlib.image.AxesImage at 0x1c42cdb810>

Before creating a mask, it's usually a good idea to despeckle the image using a narrow gaussian filter!

1.4.2 Morphological operations

Morphology is a broad set of image processing operations that process images based on shapes. In a morphological operation, each pixel in the image is adjusted based on the value of other pixels in its neighborhood. By choosing the size and shape of the neighborhood, you can construct a morphological operation that is sensitive to specific shapes in the input image. (explanation from Mathworks)

Morphological operations are based around a structuring element, which is a small binary image, often of a disk or a square. The structuring element is positioned at all possible locations in the image and it is compared with the corresponding neighbourhood of pixels. Some operations test whether the element "fits" within the neighbourhood, while others test whether it "hits" or intersects the neighbourhood.

Common operations for image processing

Erosion - the output is 1 wherever the structuring element fits entirely inside the mask (shrinks the mask)

Dilation - the output is 1 wherever the structuring element hits (intersects) the mask (expands the mask)

Opening - erosion followed by dilation (opens gaps in spots where the mask is weakly connected)

Closing - dilation followed by erosion (closes holes in the mask)

A very thorough explanation of morphological operations can be found here
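To demystify the definitions above, here is a numpy-only sketch of erosion and dilation with a 3x3 square structuring element (the function names are my own, and zero padding at the border is just one of several possible conventions):

```python
import numpy as np

def dilate3(mask):
    """Binary dilation with a 3x3 square structuring element (zero padding)."""
    p = np.pad(mask, 1)
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + h, dx:dx + w]   # "hits": any neighbor is 1
    return out

def erode3(mask):
    """Binary erosion: a pixel survives only if the whole 3x3 element fits."""
    p = np.pad(mask, 1)   # zero padding, so the mask also erodes at the border
    out = np.ones_like(mask)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + h, dx:dx + w]   # "fits": all neighbors are 1
    return out

# opening = dilate3(erode3(m)); closing = erode3(dilate3(m))
```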

In [88]:
from skimage.morphology import erosion, dilation, opening, closing
from skimage.morphology import disk

#define a "disk" structuring element
selem = disk(10)

erosion_mask = erosion(mask, selem)
dilation_mask = dilation(mask, selem)
opening_mask = opening(mask, selem)
closing_mask = closing(mask, selem)

plt.figure(figsize=(15,10))
plt.subplot(231)
plt.gca().set_title('mask')
plt.imshow(mask, interpolation='nearest', cmap=plt.cm.gray)
plt.subplot(232)
plt.gca().set_title('erosion')
plt.imshow(erosion_mask, interpolation='nearest', cmap=plt.cm.gray)
plt.subplot(233)
plt.gca().set_title('dilation')
plt.imshow(dilation_mask, interpolation='nearest', cmap=plt.cm.gray)
plt.subplot(235)
plt.gca().set_title('opening')
plt.imshow(opening_mask, interpolation='nearest', cmap=plt.cm.gray)
plt.subplot(236)
plt.gca().set_title('closing')
plt.imshow(closing_mask, interpolation='nearest', cmap=plt.cm.gray)
Out[88]:
<matplotlib.image.AxesImage at 0x1c44bd6450>

1.4.3 Masking actual data

We'll repeat the thresholding procedure using an actual microscopy image of fluorescent nuclei.

In [89]:
import matplotlib.pyplot as plt
import numpy as np
from skimage import img_as_float
import matplotlib.image as mpimg

#this is how we load an image from the hard drive
image_nuclei = img_as_float(mpimg.imread("../Data/xy040-1.png"))


fig = plt.figure(num=None, figsize=(7.1, 4.6), dpi=80, facecolor='w', edgecolor='k')
print("The number of dimensions of the image is: ", image_nuclei.ndim)
print("The size of the image is: ", image_nuclei.shape)
plt.imshow(image_nuclei, cmap=plt.cm.gray, vmin=0, vmax=0.01)
The number of dimensions of the image is:  2
The size of the image is:  (940, 1392)
Out[89]:
<matplotlib.image.AxesImage at 0x1c44ff83d0>

Again, let's plot a histogram of intensity values

In [90]:
plt.hist(image_nuclei.flatten(),bins=250)
plt.xlim((0, 0.02))
plt.show()

And again we'll pick a value by eye:

In [91]:
thresh = 0.003

#remember to despeckle before creating a mask!
mask = filters.gaussian(image_nuclei, sigma=1)>thresh
plt.figure(figsize=(8,15))
plt.subplot(311)
plt.gca().set_title('original')
plt.imshow(image_nuclei, interpolation='nearest', cmap=plt.cm.gray, vmin=0, vmax=0.01)
plt.subplot(312)
plt.gca().set_title('mask')
plt.imshow(mask, interpolation='nearest', cmap=plt.cm.gray)
plt.subplot(313)
plt.gca().set_title('masked image')
plt.imshow(image_nuclei*mask, interpolation='nearest', cmap=plt.cm.gray, vmin=0, vmax=0.01)
Out[91]:
<matplotlib.image.AxesImage at 0x1c469b68d0>

Not bad! But also not very scalable. If we have 100s of images we can't look at them one-by-one and find thresholds by eye. Next, we'll look at some methods for automatically finding the thresholds.

1.5 Automated threshold calculation

There is a very large number of threshold-calculation algorithms out there, each optimized for different situations. We will briefly review a few of the most common ones.

1.5.1 Iterative mean thresholding

Algorithm:

  1. Start with some threshold $T_i$
  2. Compute the means $m_0$ and $m_1$ of the background and foreground pixels
  3. Update $T_{i+1} = \frac{m_0+m_1}{2}$
  4. Repeat until the threshold converges
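The four steps above can be sketched directly in numpy (the function name and convergence tolerance are my own choices; the sketch assumes both classes stay non-empty during the iteration):

```python
import numpy as np

def iterative_mean_threshold(img, tol=1e-6):
    """Iterative mean thresholding; assumes both classes stay non-empty."""
    t = img.mean()                    # step 1: initial guess
    while True:
        m0 = img[img <= t].mean()     # step 2: background mean
        m1 = img[img > t].mean()      #         foreground mean
        t_new = (m0 + m1) / 2         # step 3: update
        if abs(t_new - t) < tol:      # step 4: repeat until convergence
            return t_new
        t = t_new
```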

1.5.2 Otsu thresholding

The algorithm exhaustively searches for the threshold that minimizes the intra-class variance, defined for a given threshold $T$ as a weighted sum of variances of the two classes: $\sigma^2_w(T)=\omega_0(T)\sigma^2_0(T)+\omega_1(T)\sigma^2_1(T)$

For 2 classes, minimizing the intra-class variance is equivalent to maximizing inter-class variance, which is much easier to calculate: \begin{align} \sigma^2_b(T) & =\sigma^2-\sigma^2_w(T)=\omega_0(\mu_0-\mu_T)^2+\omega_1(\mu_1-\mu_T)^2 \\ & =\omega_0(T) \omega_1(T) \left[\mu_0(T)-\mu_1(T)\right]^2 \end{align}
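The exhaustive search over candidate thresholds that maximizes $\sigma^2_b(T)$ can be vectorized with cumulative sums. A numpy-only sketch (the function name and bin count are my own choices, not scikit-image's implementation):

```python
import numpy as np

def otsu_np(img, nbins=256):
    """Exhaustive search for the threshold maximizing inter-class variance."""
    counts, edges = np.histogram(img.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(counts).astype(float)          # class-0 weight per candidate
    w1 = counts.sum() - w0                        # class-1 weight per candidate
    cum_mean = np.cumsum(counts * centers)
    mu0 = cum_mean / np.maximum(w0, 1)            # class means (guard empty class)
    mu1 = (cum_mean[-1] - cum_mean) / np.maximum(w1, 1)
    sigma_b = w0 * w1 * (mu0 - mu1) ** 2          # inter-class variance
    return centers[np.argmax(sigma_b)]
```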

1.5.3 Triangle thresholding

Algorithm:

  1. Draw a straight line between the histogram peak and the brightest value.
  2. From every point on that line, draw the shortest connecting line to the histogram.
  3. Find the longest of these connecting lines.
  4. The threshold is set where that longest connecting line meets the histogram.

note: Triangle thresholding is good for situations where the image is mostly background, and there is no clear "peak" of bright pixels.
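The geometric construction above can be sketched on a precomputed histogram (the function name is my own; the sketch assumes the bright tail lies to the right of the histogram peak):

```python
import numpy as np

def triangle_np(counts, centers):
    """Triangle thresholding on a precomputed histogram.

    Assumes the bright tail lies to the right of the histogram peak.
    """
    peak = np.argmax(counts)
    last = len(counts) - 1
    x = np.arange(peak, last + 1)
    y = counts[peak:last + 1]
    # line from (peak, counts[peak]) to (last, counts[last])
    dx, dy = last - peak, counts[last] - counts[peak]
    # perpendicular distance from each histogram point to that line
    dist = np.abs(dy * (x - peak) - dx * (y - counts[peak])) / np.hypot(dx, dy)
    return centers[peak + np.argmax(dist)]
```

The threshold lands at the "elbow" of the histogram, where it bends away from the peak-to-tail line.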

scikit-image's filters module implements a large variety of thresholding algorithms. Let's apply the ones we just learned about.

In [101]:
from skimage import filters

meanThresh = filters.threshold_mean(image_nuclei)
print(meanThresh)

OtsuThresh = filters.threshold_otsu(image_nuclei)
print(OtsuThresh)

TriThresh = filters.threshold_triangle(image_nuclei)
print(TriThresh)
0.0023766113
0.0040662913
0.0022590505

Let's look at the resulting masks we get with each of these thresholds

In [102]:
fig = plt.figure(num=None, figsize=(12, 8), dpi=80)
ax1 = fig.add_axes([0.1,0.6,0.4,0.4])

ax1.hist(image_nuclei.flatten(),bins=250)

ax1.axvline(meanThresh, color='g', linestyle='--')
ax1.axvline(OtsuThresh, color='r', linestyle='--')
ax1.axvline(TriThresh, color='k', linestyle='--')

ax1.legend(['mean' ,'otsu', 'triangle'])
ax1.set_title('histogram')

ax2 = fig.add_axes([0.6,0.6,0.4,0.4])
mask_mean = filters.gaussian(image_nuclei, sigma=1)>meanThresh

ax2.imshow(mask_mean)
ax2.set_title('Iterative mean')
ax2.set_axis_off()

ax3 = fig.add_axes([0.1,0.1,0.4,0.4])
mask_otsu = filters.gaussian(image_nuclei, sigma=1)>OtsuThresh

ax3.imshow(mask_otsu)
ax3.set_title('Otsu')
ax3.set_axis_off()

ax4 = fig.add_axes([0.6,0.1,0.4,0.4])
mask_tri = filters.gaussian(image_nuclei, sigma=1)>TriThresh

ax4.imshow(mask_tri)
ax4.set_title('Triangle')
ax4.set_axis_off()

Let's briefly look back at the retinal images and try to mask only the vasculature:

load the image and apply LoG filter:

In [103]:
import matplotlib.pyplot as plt
import numpy as np
from skimage import img_as_float, filters
import matplotlib.image as mpimg
from skimage.color import rgb2gray
from skimage.util import invert

#this is how we load an image from the hard drive
image_ridges = rgb2gray(img_as_float(mpimg.imread("../Data/RetinalPhoto.png")))

plt.imshow(image_ridges, cmap=plt.cm.gray)
plt.gca().set_title('retinal photo')
plt.gca().set_axis_off()

#apply the LoG filter to the inverted image
sigma = 3
LoG_ridges = filters.laplace(filters.gaussian(invert(image_ridges), sigma=sigma))

Now, let's do the same procedure of automatically finding thresholds:

In [112]:
from skimage import filters

meanThresh = filters.threshold_mean(LoG_ridges)
print(meanThresh)

OtsuThresh = filters.threshold_otsu(LoG_ridges)
print(OtsuThresh)

TriThresh = filters.threshold_triangle(LoG_ridges)
print(TriThresh)

fig = plt.figure(num=None, figsize=(12, 8), dpi=80)
ax1 = fig.add_axes([0.1,0.6,0.4,0.4])

#histogram of the LoG-filtered image (the image the thresholds were computed on)
ax1.hist(LoG_ridges.flatten(),bins=250)

ax1.axvline(meanThresh, color='g', linestyle='--')
ax1.axvline(OtsuThresh, color='r', linestyle='--')
ax1.axvline(TriThresh, color='k', linestyle='--')
ax1.legend(['mean' ,'otsu', 'triangle'])
ax1.set_title('histogram')

#inset showing the original retinal photo
axIns = fig.add_axes([0.2,0.65,0.3,0.3])
axIns.imshow(image_ridges, cmap=plt.cm.gray)
axIns.set_axis_off()

ax2 = fig.add_axes([0.6,0.6,0.4,0.4])
mask_mean = LoG_ridges>meanThresh

ax2.imshow(mask_mean)
ax2.set_title('Iterative mean')
ax2.set_axis_off()

ax3 = fig.add_axes([0.1,0.1,0.4,0.4])
mask_otsu = LoG_ridges>OtsuThresh

ax3.imshow(mask_otsu)
ax3.set_title('Otsu')
ax3.set_axis_off()

ax4 = fig.add_axes([0.6,0.1,0.4,0.4])
mask_tri = LoG_ridges>TriThresh

ax4.imshow(mask_tri)
ax4.set_title('Triangle')
ax4.set_axis_off()
-5.361737e-13
0.002177109
0.000341843

Play around with the width of the Gaussian. Which thresholding algorithm works best in this case?

1.5.4 Local thresholding

All of the methods we have seen so far are global, in the sense that the same threshold is applied to the whole picture. Sometimes an image has vastly different intensity distributions at different locations. Local thresholding handles such cases by comparing each pixel against a threshold computed from its own neighborhood.
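As a sketch of the idea, a per-pixel threshold map can be built from each pixel's neighborhood mean (roughly what scikit-image's threshold_local does with method='mean'; the function name here is made up):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_mean_threshold(image, block_size=35, offset=0.0):
    # per-pixel threshold: the mean over a block_size x block_size
    # neighborhood, minus an optional offset
    return uniform_filter(image.astype(float), size=block_size) - offset

# usage: mask = image > local_mean_threshold(image, block_size=25)
```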

Let's compare the results from a global (Otsu) and a local threshold.

In [94]:
from skimage import data

image = data.page()
fig = plt.figure(num=None, figsize=(12, 8), dpi=80)

#global thresholding
threshGlobal = filters.threshold_otsu(image)

ax1 = fig.add_axes([0.1,0.6,0.4,0.4])
ax1.set_title('original')
plt.imshow(image ,cmap='gray')

ax2 = fig.add_axes([0.6,0.6,0.4,0.4])
ax2.set_title('mask - Otsu threshold')
plt.imshow(image>threshGlobal,cmap='gray')

#local thresholding

#Try and change this number and see what happens
block_size  = 81

threshLocal = filters.threshold_local(image, block_size)

ax1 = fig.add_axes([0.1,0.2,0.4,0.4])
ax1.imshow(threshLocal,cmap='gray')
ax1.set_title('local threshold map')


ax2 = fig.add_axes([0.6,0.2,0.4,0.4])
ax2.set_title('mask - Local threshold')
plt.imshow(image>threshLocal,cmap='gray')
Out[94]:
<matplotlib.image.AxesImage at 0x1c467dff10>

2. Image segmentation

Image segmentation is the process of partitioning a digital image into multiple segments. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze.

2.1 Connected components

After we generate a mask, the simplest segmentation involves simply looking for regions in the mask that are connected and labeling each one as a separate object.

We begin by generating a simple mask using the triangle threshold method:

In [23]:
import matplotlib.pyplot as plt
import numpy as np
from skimage import img_as_float
import matplotlib.image as mpimg
from skimage import filters

image_nuclei = img_as_float(mpimg.imread("../Data/xy040-1.png"))
TriThresh = filters.threshold_triangle(image_nuclei)
#despeckle
mask = filters.gaussian(image_nuclei, sigma=1)>TriThresh

scikit-image's measure module implements a variety of useful methods for segmentation. The label function returns a labeled image of connected components (CCs); each CC is uniquely numbered by an integer:

$\begin{bmatrix} 1 & 1 & 0 & 0 & 2\\ 1 & 1 & 0 & 2 & 2\\ 0 & 0 & 0 & 0 & 0\\ 0 & 3 & 0 & 4 & 4\\ 0 & 0 & 0 & 4 & 4\\ \end{bmatrix}$
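For instance, labeling the mask shown above (a small sketch; in practice label numbering follows scan order, though that is an implementation detail):

```python
import numpy as np
from skimage import measure

# the mask from the matrix above; label() numbers each connected component
small_mask = np.array([[1, 1, 0, 0, 1],
                       [1, 1, 0, 1, 1],
                       [0, 0, 0, 0, 0],
                       [0, 1, 0, 1, 1],
                       [0, 0, 0, 1, 1]], dtype=bool)
print(measure.label(small_mask))
```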

In [24]:
from skimage import measure
labels = measure.label(mask)

plt.figure(figsize=(12,5))

plt.subplot(121)
plt.imshow(mask, cmap='gray')
plt.subplot(122)
plt.imshow(labels, cmap='nipy_spectral')
Out[24]:
<matplotlib.image.AxesImage at 0x1c2a30c250>

We can easily generate a mask for a specific CC using the boolean comparison labels==i

In [25]:
i=43
mask_of_CC_i = labels==i

plt.figure(figsize=(10,5))
plt.imshow(mask_of_CC_i, cmap='gray')
Out[25]:
<matplotlib.image.AxesImage at 0x1c29aa1910>

Problem with simple CC segmentation: overlapping objects

We usually want exactly one object per label. With plain CC labeling, any touching or overlapping objects merge into a single blob:

In [26]:
i=87
mask_of_CC_i = labels==i
plt.imshow(mask_of_CC_i, cmap='gray')
Out[26]:
<matplotlib.image.AxesImage at 0x1c29fdddd0>

These problems can be partially resolved using morphological operations, but there is no silver bullet.
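For example, an opening (erosion followed by dilation with the same structuring element) removes objects smaller than the element while largely preserving bigger ones (a small sketch):

```python
import numpy as np
from skimage.morphology import binary_opening, disk

mask_demo = np.zeros((9, 9), dtype=bool)
mask_demo[1:5, 1:5] = True   # a solid 4x4 object
mask_demo[7, 7] = True       # a single-pixel speck

# the speck is smaller than disk(1), so opening removes it;
# the 4x4 object survives
opened = binary_opening(mask_demo, disk(1))
```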

In [27]:
from skimage.morphology import erosion, dilation, opening, closing
from skimage.morphology import disk

#define two "disk" structuring elements
selem1 = disk(10)
selem2 = disk(7)

plt.figure(figsize=(15,10))
plt.subplot(121)
plt.gca().set_title('original')
plt.imshow(mask, cmap='nipy_spectral')
plt.subplot(122)
plt.gca().set_title('opening')
plt.imshow(dilation(erosion(mask, selem1),selem2), interpolation='nearest', cmap='nipy_spectral')
Out[27]:
<matplotlib.image.AxesImage at 0x1c29415790>

2.2 Watershed based segmentation

2.2.1 The watershed algorithm

The watershed transformation treats the image it operates upon like a topographic map, with the brightness of each point representing its height, and finds the lines that run along the tops of ridges.

More precisely, the algorithm goes as follows:

  1. Label the local minima (i.e. $S_1$, $S_2$)
  2. Move to the next higher intensity level
  3. Assign each point the label of its nearest labeled set. Points equidistant to multiple sets are marked as boundaries and their intensities set to 0
  4. Repeat until all points are labeled
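The flooding procedure can be seen on a tiny synthetic topography (a sketch; note that in recent scikit-image releases the function lives in skimage.segmentation rather than skimage.morphology):

```python
import numpy as np
from skimage.segmentation import watershed  # skimage.morphology.watershed in older releases

# two basins (minima in columns 1 and 5) separated by a ridge (column 3);
# with no markers given, the local minima seed the flooding
topo = np.array([[2, 1, 2, 3, 2, 1, 2],
                 [2, 1, 2, 3, 2, 1, 2],
                 [2, 1, 2, 3, 2, 1, 2]])
labels = watershed(topo)
print(labels)
```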

Let's start with a very naive application. We will invert the image, and then simply apply the watershed function from the scikit-image morphology module. The function returns a labeled image.

In [28]:
import matplotlib.pyplot as plt
import numpy as np
from skimage import img_as_float
import matplotlib.image as mpimg
from skimage import filters

from skimage.util import invert
from skimage.morphology import watershed


image_nuclei = img_as_float(mpimg.imread("../Data/xy040-1.png"))


#invert image
image_to_watershed = invert(image_nuclei)

#Calculate watershed transform
labels_naive = watershed(image_to_watershed, watershed_line=True)

#let's look at all the boundaries
plt.figure(figsize=(12,20))
plt.subplot(211)
plt.gca().set_title('image fed to watershed')
plt.imshow(image_to_watershed, cmap='gray')
plt.subplot(212)
plt.gca().set_title('watershed result')
plt.imshow(filters.sobel(labels_naive), cmap='nipy_spectral')
Out[28]:
<matplotlib.image.AxesImage at 0x1c2cbd1d50>

So this clearly didn't work. Why? How do we fix it?

Noise generates a ton of local minima. Each gets its own basin. This leads to massive oversegmentation.

Watershed segmentation is only a part of a segmentation pipeline. Preprocessing (denoising, smoothing, seeding minima) of the image is CRUCIAL for it to work well.

The first thing we'll do is to apply the mask that we found before. This is simply done by adding a mask argument to the watershed function.

In [24]:
import matplotlib.pyplot as plt
import numpy as np
from skimage import img_as_float
import matplotlib.image as mpimg
from skimage import filters

from skimage.util import invert
from skimage.morphology import watershed

image_nuclei = img_as_float(mpimg.imread("../Data/xy040-1.png"))

TriThresh = filters.threshold_triangle(image_nuclei)

mask = filters.gaussian(image_nuclei, sigma=1)>TriThresh

#mask and invert image
image_to_watershed = image_nuclei*mask
image_to_watershed = invert(image_to_watershed)

#Calculate watershed transform
#pass the mask to the watershed function so it avoids segmenting the BG
labels_masked = watershed(image_to_watershed, watershed_line=True, mask=mask)

#let's look at all the boundaries
plt.figure(figsize=(12,20))
plt.subplot(211)
plt.gca().set_title('image fed to watershed')
plt.imshow(image_to_watershed, cmap='gray')
plt.subplot(212)
plt.gca().set_title('watershed result')
plt.imshow(filters.sobel(labels_masked), cmap='nipy_spectral')
Out[24]:
<matplotlib.image.AxesImage at 0x1c22f6b9d0>

So we got rid of all the BG regions, but we are still oversegmenting. Why?

Let's try smoothing the image to get rid of the many local minima. How wide should the Gaussian kernel be?

In [25]:
import matplotlib.pyplot as plt
import numpy as np
from skimage import img_as_float
import matplotlib.image as mpimg
from skimage import filters

from skimage.util import invert
from skimage.morphology import watershed, thin

image_nuclei = img_as_float(mpimg.imread("../Data/xy040-1.png"))

TriThresh = filters.threshold_triangle(image_nuclei)

mask = filters.gaussian(image_nuclei, sigma=1)>TriThresh

#mask, smooth, and invert the image
sigma_for_smoothing = 4
image_to_watershed = image_nuclei*mask
image_to_watershed = filters.gaussian(image_to_watershed, sigma=sigma_for_smoothing)
image_to_watershed = invert(image_to_watershed)

#Calculate watershed transform
#pass the mask to the watershed function so it avoids segmenting the BG
labels_masked_smooth = watershed(image_to_watershed, watershed_line=True, mask=mask)

#let's look at all the boundaries
plt.figure(figsize=(12,20))
plt.subplot(211)
plt.gca().set_title('image fed to watershed')
plt.imshow(image_to_watershed, cmap='gray')
plt.subplot(212)
plt.gca().set_title('watershed result')
plt.imshow(filters.sobel(labels_masked_smooth), cmap='nipy_spectral')
Out[25]:
<matplotlib.image.AxesImage at 0x1c29e950d0>

We're starting to get somewhere!! Can we do better?

We can do more to help the algorithm by providing local markers (seeds) from which to start the process

We will find seeds by detecting local maxima within 30x30-pixel neighborhoods, using the footprint argument of the function peak_local_max

In [26]:
import matplotlib.pyplot as plt
import numpy as np
from skimage import img_as_float
import matplotlib.image as mpimg
from skimage import filters
from skimage import measure
from skimage.morphology import disk
from skimage.morphology import erosion, dilation, opening, closing

from skimage.util import invert
from skimage.morphology import watershed

from skimage.feature import peak_local_max

image_nuclei = img_as_float(mpimg.imread("../Data/xy040-1.png"))

#recompute the threshold so this cell runs on its own
TriThresh = filters.threshold_triangle(image_nuclei)
mask = filters.gaussian(image_nuclei, sigma=1)>TriThresh


#mask, smooth, and invert the image
sigma_for_smoothing = 3
image_to_watershed = image_nuclei*mask
image_to_watershed = filters.gaussian(image_to_watershed, sigma=sigma_for_smoothing)
image_to_watershed = invert(image_to_watershed)

#find local peaks to use as seeds
MaskedImagePeaks = peak_local_max(invert(image_to_watershed), footprint=np.ones((30, 30)), threshold_abs=None, threshold_rel=None, exclude_border=False, indices=False).astype('int')

#create disk structuring element of radius 5
selem = disk(5)
#dilate local peaks so that close ones merge
peakMask = dilation(MaskedImagePeaks,selem)
# label local peak regions to find initial markers
markers = measure.label(peakMask)


#pass the *markers* argument to the watershed function
labels_localmax_markers = watershed(image_to_watershed, markers, watershed_line=True, mask=mask)

#let's look at all the boundaries
plt.figure(figsize=(12,20))
plt.subplot(211)
plt.gca().set_title('image fed to watershed')
plt.imshow(image_to_watershed-peakMask, cmap='gray')
plt.clim((0.95, 1))
plt.subplot(212)
plt.gca().set_title('watershed result')
plt.imshow(filters.sobel(labels_localmax_markers), cmap='nipy_spectral')
Out[26]:
<matplotlib.image.AxesImage at 0x1c22f6bd50>

This is pretty good! We still get a few errors here and there, but there is no big systematic over- or under-segmentation. This is a typical good result when dealing with real data.

3. Feature extraction

Feature extraction is a process of dimensionality reduction by which an initial raw image is reduced to a list of objects and their attributes

3.1 Extracting region properties

scikit-image's measure module implements a method called regionprops that accepts a labeled mask of connected components, and, optionally, a corresponding image, and returns a list. Each object on the list contains useful data about the size, shape, position, and intensity (see the full list here) of a specific component.

The length of the list is equal to the total number of objects detected.

We'll start by extracting the number of CCs we found and the area of each CC

In [6]:
from skimage import measure

#extract per-object properties from the label image and the intensity image
props = measure.regionprops(labels_localmax_markers,image_nuclei)
#how many total connected components did we get?
print(len(props))


#e.g. the perimeter of object 1
props[1].perimeter
229
Out[6]:
115.97665940288701
In [7]:
areas = [r.area for r in props]
intensities = [r.mean_intensity for r in props]

#plot histograms of the areas and mean intensities
plt.figure(figsize=(12,6))
plt.subplot(121)
plt.gca().set_title('areas')
plt.hist(areas)
plt.subplot(122)
plt.gca().set_title('intensities')
plt.hist(intensities)
Out[7]:
(array([11., 31., 37., 59., 43., 25., 16.,  6.,  0.,  1.]),
 array([0.00341783, 0.00418932, 0.0049608 , 0.00573229, 0.00650378,
        0.00727526, 0.00804675, 0.00881824, 0.00958972, 0.01036121,
        0.01113269], dtype=float32),
 <a list of 10 Patch objects>)

We can look at each individual object found

In [9]:
i=10
plt.imshow(props[i].intensity_image)
plt.gca().set_title('Single cell')
Out[9]:
Text(0.5, 1.0, 'Single cell')

Let's use a scatter plot to compare our results to the image

In [10]:
intensities = np.array([r.mean_intensity for r in props])
centroids = np.array([r.centroid for r in props])


fig = plt.figure(figsize=(12,15))
fig.add_axes([0.1,0.6,0.4,0.25])
plt.gca().set_title('original')
plt.imshow(image_nuclei, interpolation='nearest', cmap=plt.cm.gray, vmin=0, vmax=0.02)
fig.add_axes([0.6,0.6,0.4,0.25])
plt.scatter(centroids[:,1],centroids[:,0], c=intensities)
plt.axis('equal')
plt.gca().invert_yaxis()

Or even nicer scatter plots!

In [11]:
intensities = np.array([r.mean_intensity for r in props])
areas = np.array([r.area for r in props])

centroids = np.array([r.centroid for r in props])


fig = plt.figure(figsize=(12,15))
fig.add_axes([0.1,0.6,0.4,0.25])
plt.gca().set_title('original')
plt.imshow(image_nuclei, interpolation='nearest', cmap=plt.cm.gray, vmin=0, vmax=0.02)
fig.add_axes([0.6,0.6,0.4,0.25])
plt.scatter(centroids[:,1],centroids[:,0], c=intensities, s=areas/20)
plt.axis('equal')
plt.gca().invert_yaxis()

plt.text(centroids[10,1],centroids[10,0],props[10].label)
Out[11]:
Text(1367.2072186836517, 36.66751592356688, '11')

You can even draw your points directly on the image!

In [12]:
intensities = np.array([r.mean_intensity for r in props])
areas = np.array([r.area for r in props])

centroids = np.array([r.centroid for r in props])


fig = plt.figure(figsize=(12,15))
fig.add_axes([0.1,0.6,0.8,0.5])
plt.gca().set_title('original')
plt.imshow(image_nuclei, interpolation='nearest', cmap=plt.cm.gray, vmin=0, vmax=0.02)
plt.gca().patch.set_alpha(0.5)
plt.scatter(centroids[:,1],centroids[:,0], c=intensities, alpha=1)
Out[12]:
<matplotlib.collections.PathCollection at 0x1c28c4a650>

Let's define some useful functions for converting a list of props into a pandas DataFrame. These should become obsolete soon, since newer versions of scikit-image will have this functionality built in
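In newer scikit-image releases, measure.regionprops_table does this directly (a sketch on a toy label image, assuming a version that includes regionprops_table):

```python
import numpy as np
import pandas as pd
from skimage import measure

# two components: a 2-pixel object and a 1-pixel object
lab = measure.label(np.array([[1, 1, 0],
                              [0, 0, 0],
                              [0, 0, 1]], dtype=bool))

# each requested property becomes a column; tuple-valued properties
# expand into one column per element (e.g. centroid-0, centroid-1)
df = pd.DataFrame(measure.regionprops_table(lab, properties=('label', 'area', 'centroid')))
print(df)
```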

In [13]:
import pandas as pd

def scalar_attributes_list(im_props):
    """
    Makes list of all scalar, non-dunder, non-hidden
    attributes of skimage.measure.regionprops object
    """
    
    attributes_list = []
    
    for test_attribute in dir(im_props[0]):
        
        #Attribute should not start with _ and cannot return an array
        #does not yet return tuples
        if test_attribute[:1] != '_' and not\
                isinstance(getattr(im_props[0], test_attribute), np.ndarray):                
            attributes_list += [test_attribute]
            
    return attributes_list


def regionprops_to_df(im_props):
    """
    Read content of all attributes for every item in a list
    output by skimage.measure.regionprops
    """

    attributes_list = scalar_attributes_list(im_props)

    # Initialise list of lists for parsed data
    parsed_data = []

    # Put data from im_props into list of lists
    for i, _ in enumerate(im_props):
        parsed_data += [[]]
        
        for j in range(len(attributes_list)):
            parsed_data[i] += [getattr(im_props[i], attributes_list[j])]

    # Return as a Pandas DataFrame
    return pd.DataFrame(parsed_data, columns=attributes_list)

Now, to get all the properties in table/dataframe form we simply run:

In [14]:
props_df = regionprops_to_df(props)
props_df
/Users/alon/opt/anaconda3/lib/python3.7/site-packages/skimage/measure/_regionprops.py:250: UserWarning: regionprops and image moments (including moments, normalized moments, central moments, and inertia tensor) of 2D images will change from xy coordinates to rc coordinates in version 0.16.
See https://scikit-image.org/docs/0.14.x/release_notes_and_installation.html#deprecations for details on how to avoid this message.
  warn(XY_TO_RC_DEPRECATION_MESSAGE)
/Users/alon/opt/anaconda3/lib/python3.7/site-packages/skimage/measure/_regionprops.py:260: UserWarning: regionprops and image moments (including moments, normalized moments, central moments, and inertia tensor) of 2D images will change from xy coordinates to rc coordinates in version 0.16.
See https://scikit-image.org/docs/0.14.x/release_notes_and_installation.html#deprecations for details on how to avoid this message.
  warn(XY_TO_RC_DEPRECATION_MESSAGE)
Out[14]:
area bbox bbox_area centroid convex_area eccentricity equivalent_diameter euler_number extent filled_area ... major_axis_length max_intensity mean_intensity min_intensity minor_axis_length orientation perimeter slice solidity weighted_centroid
0 1034 (0, 152, 27, 202) 1350 (11.19439071566731, 176.13056092843325) 1063 0.803082 36.284014 1 0.765926 1034 ... 48.361710 0.012329 0.006054 0.002090 28.817252 0.095973 134.219300 (slice(0, 27, None), slice(152, 202, None)) 0.972719 (9.471141254673604, 177.62865715399982)
1 737 (0, 453, 22, 499) 1012 (8.40841248303935, 477.9036635006784) 748 0.873175 30.632949 1 0.728261 737 ... 45.291488 0.011719 0.006293 0.002029 22.075411 -0.096784 115.976659 (slice(0, 22, None), slice(453, 499, None)) 0.985294 (7.391350978747608, 477.89791476736195)
2 150 (0, 607, 7, 637) 210 (2.3066666666666666, 621.5066666666667) 156 0.968575 13.819766 1 0.714286 150 ... 28.818028 0.005188 0.003736 0.002182 7.167693 0.005792 62.349242 (slice(0, 7, None), slice(607, 637, None)) 0.961538 (2.086598938393352, 621.3798371405671)
3 1897 (0, 720, 36, 785) 2340 (15.205060622034791, 750.3494992092778) 1917 0.823359 49.146062 1 0.810684 1897 ... 66.525245 0.009247 0.005963 0.001968 37.754509 0.087716 175.154329 (slice(0, 36, None), slice(720, 785, None)) 0.989567 (14.306770283648738, 750.1989996495538)
4 537 (0, 1043, 23, 1084) 943 (7.527001862197393, 1059.1340782122904) 566 0.834170 26.148224 1 0.569459 537 ... 38.057977 0.013672 0.008977 0.001862 20.989282 0.209852 104.083261 (slice(0, 23, None), slice(1043, 1084, None)) 0.948763 (6.571785374039736, 1059.760904327393)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
224 315 (929, 2, 940, 40) 418 (934.9904761904762, 22.006349206349206) 326 0.956888 20.026744 1 0.753589 315 ... 38.293356 0.006836 0.004418 0.002197 11.122592 0.050513 85.213203 (slice(929, 940, None), slice(2, 40, None)) 0.966258 (935.2524699935746, 21.978047055834875)
225 825 (919, 104, 940, 153) 1029 (930.540606060606, 128.49818181818182) 847 0.897331 32.410224 1 0.801749 825 ... 49.972778 0.006332 0.004310 0.002075 22.055891 0.013776 122.526912 (slice(919, 940, None), slice(104, 153, None)) 0.974026 (931.3903367610546, 128.1929432883285)
226 693 (921, 316, 940, 368) 988 (932.01443001443, 343.3679653679654) 716 0.924588 29.704461 1 0.701417 693 ... 50.175144 0.009720 0.005787 0.001877 19.115116 0.051603 121.597980 (slice(921, 940, None), slice(316, 368, None)) 0.967877 (932.6920905094222, 343.974384656583)
227 670 (921, 654, 940, 706) 988 (932.110447761194, 678.4492537313433) 689 0.921987 29.207371 1 0.678138 670 ... 49.244464 0.010636 0.006345 0.001938 19.068457 -0.055730 120.426407 (slice(921, 940, None), slice(654, 706, None)) 0.972424 (933.012304602244, 678.4888099188446)
228 732 (917, 1063, 940, 1103) 920 (929.3989071038251, 1082.3265027322404) 751 0.794461 30.528861 1 0.795652 732 ... 39.817140 0.015244 0.006733 0.001984 24.181563 0.031461 106.769553 (slice(917, 940, None), slice(1063, 1103, None)) 0.974700 (930.9205465545213, 1081.3355492352161)

229 rows × 23 columns

Finally, if we imaged our cells in multiple channels, we would want to use the same segmented nuclei to measure intensities in the other channels.

In [40]:
from skimage import measure
from skimage import img_as_float
import matplotlib.image as mpimg

#load another channel
image_2ndChannel = img_as_float(mpimg.imread("../Data/xy040-2.png"))

# extract regionprops using labels_localmax_markers mask from image_2ndChannel
props_other_channel = measure.regionprops(labels_localmax_markers,image_2ndChannel)


plt.figure(figsize=(12,6))
plt.subplot(121)
plt.gca().set_title('Nuclei')
plt.imshow(image_nuclei, cmap='gray')
plt.subplot(122)
plt.gca().set_title('other channel')
plt.imshow(image_2ndChannel, cmap='gray')
Out[40]:
<matplotlib.image.AxesImage at 0x1c34a9d810>
In [ ]:
mean_2nd_channel = [r.mean_intensity for r in props_other_channel]
max_2nd_channel = [r.max_intensity for r in props_other_channel]
min_2nd_channel = [r.min_intensity for r in props_other_channel]

Add these new features to the pandas dataframe

In [17]:
props_df['mean_intensity_ch2'] = mean_2nd_channel
props_df['max_intensity_ch2'] = max_2nd_channel
props_df['min_intensity_ch2'] = min_2nd_channel

props_df
Out[17]:
area bbox bbox_area centroid convex_area eccentricity equivalent_diameter euler_number extent filled_area ... min_intensity minor_axis_length orientation perimeter slice solidity weighted_centroid mean_intensity_ch2 max_intensity_ch2 min_intensity_ch2
0 1034 (0, 152, 27, 202) 1350 (11.19439071566731, 176.13056092843325) 1063 0.803082 36.284014 1 0.765926 1034 ... 0.002090 28.817252 0.095973 134.219300 (slice(0, 27, None), slice(152, 202, None)) 0.972719 (9.471141254673604, 177.62865715399982) 0.000149 0.000397 0.000000
1 737 (0, 453, 22, 499) 1012 (8.40841248303935, 477.9036635006784) 748 0.873175 30.632949 1 0.728261 737 ... 0.002029 22.075411 -0.096784 115.976659 (slice(0, 22, None), slice(453, 499, None)) 0.985294 (7.391350978747608, 477.89791476736195) 0.000111 0.000336 0.000000
2 150 (0, 607, 7, 637) 210 (2.3066666666666666, 621.5066666666667) 156 0.968575 13.819766 1 0.714286 150 ... 0.002182 7.167693 0.005792 62.349242 (slice(0, 7, None), slice(607, 637, None)) 0.961538 (2.086598938393352, 621.3798371405671) 0.002449 0.004120 0.000824
3 1897 (0, 720, 36, 785) 2340 (15.205060622034791, 750.3494992092778) 1917 0.823359 49.146062 1 0.810684 1897 ... 0.001968 37.754509 0.087716 175.154329 (slice(0, 36, None), slice(720, 785, None)) 0.989567 (14.306770283648738, 750.1989996495538) 0.000127 0.000366 0.000000
4 537 (0, 1043, 23, 1084) 943 (7.527001862197393, 1059.1340782122904) 566 0.834170 26.148224 1 0.569459 537 ... 0.001862 20.989282 0.209852 104.083261 (slice(0, 23, None), slice(1043, 1084, None)) 0.948763 (6.571785374039736, 1059.760904327393) 0.000121 0.000351 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
224 315 (929, 2, 940, 40) 418 (934.9904761904762, 22.006349206349206) 326 0.956888 20.026744 1 0.753589 315 ... 0.002197 11.122592 0.050513 85.213203 (slice(929, 940, None), slice(2, 40, None)) 0.966258 (935.2524699935746, 21.978047055834875) 0.000605 0.002686 0.000000
225 825 (919, 104, 940, 153) 1029 (930.540606060606, 128.49818181818182) 847 0.897331 32.410224 1 0.801749 825 ... 0.002075 22.055891 0.013776 122.526912 (slice(919, 940, None), slice(104, 153, None)) 0.974026 (931.3903367610546, 128.1929432883285) 0.000090 0.000351 0.000000
226 693 (921, 316, 940, 368) 988 (932.01443001443, 343.3679653679654) 716 0.924588 29.704461 1 0.701417 693 ... 0.001877 19.115116 0.051603 121.597980 (slice(921, 940, None), slice(316, 368, None)) 0.967877 (932.6920905094222, 343.974384656583) 0.000073 0.000397 0.000000
227 670 (921, 654, 940, 706) 988 (932.110447761194, 678.4492537313433) 689 0.921987 29.207371 1 0.678138 670 ... 0.001938 19.068457 -0.055730 120.426407 (slice(921, 940, None), slice(654, 706, None)) 0.972424 (933.012304602244, 678.4888099188446) 0.003560 0.006348 0.000778
228 732 (917, 1063, 940, 1103) 920 (929.3989071038251, 1082.3265027322404) 751 0.794461 30.528861 1 0.795652 732 ... 0.001984 24.181563 0.031461 106.769553 (slice(917, 940, None), slice(1063, 1103, None)) 0.974700 (930.9205465545213, 1081.3355492352161) 0.000136 0.000336 0.000000

229 rows × 26 columns

Sometimes it's easier to see a bimodal distribution on a log scale

In [56]:
fig = plt.figure(figsize=(12,6))
fig.add_axes([0.1,0.1,0.4,0.4])
plt.gca().set_title('Histogram of intensities')
plt.hist(props_df.mean_intensity_ch2,20)
fig.add_axes([0.6,0.1,0.4,0.4])
plt.gca().set_title('Histogram of log of intensities')

plt.hist(np.log(props_df.mean_intensity_ch2),20)
Out[56]:
(array([ 1., 13., 47., 40., 22., 14.,  5.,  3.,  1.,  4.,  2.,  3.,  3.,
         3., 10.,  8., 14., 15., 16.,  5.]),
 array([-9.80940577, -9.56936779, -9.32932981, -9.08929183, -8.84925384,
        -8.60921586, -8.36917788, -8.1291399 , -7.88910191, -7.64906393,
        -7.40902595, -7.16898797, -6.92894999, -6.688912  , -6.44887402,
        -6.20883604, -5.96879806, -5.72876007, -5.48872209, -5.24868411,
        -5.00864613]),
 <a list of 20 Patch objects>)

We can compare distributions of different channels

In [57]:
plt.figure(figsize=(12,6))

plt.gca().set_title('scatter plot of intensities')
plt.scatter(np.log(props_df['max_intensity_ch2']), np.log(props_df['max_intensity']))
plt.xlabel('Ch2')
plt.ylabel('Ch1')
Out[57]:
Text(0, 0.5, 'Ch1')

And so, we've successfully implemented a simple image segmentation pipeline for multicolor microscopy data.

Fin.